feat: StreamThreadException investigation spike for Bing Ads source #710
Draft: devin-ai-integration wants to merge 4 commits into main from devin/1755287258-bing-ads-stream-thread-exception-spike
- Document root cause analysis of UTF-8 decoding error with GZIP data
- Identify issue in CompositeRawDecoder parser selection logic
- Outline investigation areas and proposed fixes for concurrent source framework
- Reference issue #8301 with campaign_labels stream error
Co-Authored-By: unknown <>
- Create test script demonstrating StreamThreadException root cause
- Reproduce exact error: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
- Test both failing scenario (missing Content-Encoding) and correct GZIP handling
- Validate header-based parser selection in CompositeRawDecoder
Co-Authored-By: unknown <>
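For readers without the branch checked out, the failure this commit reproduces can be shown with the standard library alone. The following is a minimal sketch, not the contents of test_gzip_utf8_issue.py: the payload and the helper name `decode_as_text` are hypothetical stand-ins for a response body and for a parser that assumes plain UTF-8 text.

```python
import gzip
import json

# Hypothetical payload standing in for a campaign_labels response body.
payload = json.dumps({"campaign_id": 1, "label_id": 2}).encode("utf-8")
compressed = gzip.compress(payload)

# Every GZIP stream starts with the two magic bytes 0x1f 0x8b.
assert compressed[:2] == b"\x1f\x8b"


def decode_as_text(body: bytes) -> str:
    """Hypothetical stand-in for a parser that assumes plain UTF-8 text."""
    return body.decode("utf-8")


# Failing scenario: compressed bytes reach the text decoder unchanged.
try:
    decode_as_text(compressed)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Correct handling: decompress first, then decode and parse.
recovered = json.loads(gzip.decompress(compressed).decode("utf-8"))
print(recovered)
```

Byte 0x1f is valid ASCII, so decoding only fails at the second byte, which is why the error reports position 1.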
- Add ImprovedCompositeRawDecoder with auto-detection of GZIP content
- Detect GZIP magic bytes (0x1f 0x8b) when Content-Encoding header missing
- Provide better error handling for UTF-8 decoding of GZIP data
- Add recovery mechanism for StreamThreadException in campaign_labels stream
- Create Bing Ads compatible decoder configuration
Co-Authored-By: unknown <>
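The magic-byte detection described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in fix_gzip_parser_selection.py: `looks_like_gzip` and `decode_text` are hypothetical names, and the real decoder works with the CDK's parser classes rather than returning a plain string.

```python
import gzip
from typing import Mapping, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every GZIP stream


def looks_like_gzip(body: bytes) -> bool:
    """Return True when the body starts with the GZIP magic bytes."""
    return body[:2] == GZIP_MAGIC


def decode_text(body: bytes, headers: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: decompress when GZIP is declared or detected."""
    headers = headers or {}
    # Header names are case-insensitive, so normalize before the lookup.
    declared = any(
        name.lower() == "content-encoding" and "gzip" in value.lower()
        for name, value in headers.items()
    )
    # Fall back to sniffing the body when the header is missing.
    if declared or looks_like_gzip(body):
        body = gzip.decompress(body)
    return body.decode("utf-8")


raw = '{"label": "demo"}'.encode("utf-8")
compressed = gzip.compress(raw)

print(decode_text(compressed, {"Content-Encoding": "gzip"}))  # header present
print(decode_text(compressed))  # header missing, detected by magic bytes
print(decode_text(raw))  # plain text passes through untouched
```

This sketch does not handle a response that declares gzip but arrives already decompressed; `gzip.decompress` would raise in that case, and an integrated version would need to decide how to recover.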
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
    # Run the CLI from this branch:
    uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1755287258-bing-ads-stream-thread-exception-spike#egg=airbyte-python-cdk[dev]' --help
    # Update a connector to use the CDK from this branch ref:
    cd airbyte-integrations/connectors/source-example
    poe use-cdk-branch devin/1755287258-bing-ads-stream-thread-exception-spike
- Apply ruff formatting to test_gzip_utf8_issue.py
- Apply ruff formatting to fix_gzip_parser_selection.py
- Ensure code style compliance for CI checks
Co-Authored-By: unknown <>
StreamThreadException investigation and fix (spike, do not merge)
Summary
This spike PR investigates and proposes a fix for issue #8301, a StreamThreadException in the Bing Ads source connector where the campaign_labels stream fails with:
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Root Cause: Byte 0x8b is the second byte of the GZIP magic number (0x1f 0x8b), which indicates that GZIP-compressed data is being passed to a UTF-8 decoder. This occurs when the CompositeRawDecoder's parser selection logic fails to detect GZIP content (likely because the Content-Encoding header is missing), so compressed data is treated as plain text.
Proposed Solution: An enhanced CompositeRawDecoder that auto-detects GZIP content by its magic bytes, with better error handling and graceful fallback mechanisms.
Review & Testing Checklist for Human
🔴 High Risk - 5 Critical Items
- … test_gzip_utf8_issue.py in a proper environment to verify it reproduces the exact error (had import issues locally)
- … CompositeRawDecoder or create new implementation - current proposal creates separate class
- … campaign_labels stream to verify it resolves the issue
- … CompositeRawDecoder
Recommended Test Plan:
- … campaign_labels stream to reproduce the error
Diagram
%%{ init : { "theme" : "default" }}%%
graph TD
    Issue["Issue #8301<br/>StreamThreadException<br/>campaign_labels"]
    BingAds["airbyte/airbyte-integrations/<br/>connectors/source-bing-ads/<br/>manifest.yaml"]:::context
    CompositeDecoder["airbyte_cdk/sources/declarative/<br/>decoders/composite_raw_decoder.py"]:::context
    ConcurrentSource["airbyte_cdk/sources/concurrent_source/<br/>concurrent_read_processor.py"]:::context
    Investigation["SPIKE_INVESTIGATION.md"]:::major-edit
    TestScript["test_gzip_utf8_issue.py"]:::major-edit
    ProposedFix["fix_gzip_parser_selection.py"]:::major-edit
    Issue --> Investigation
    Issue --> TestScript
    Issue --> ProposedFix
    BingAds --> CompositeDecoder
    CompositeDecoder --> ConcurrentSource
    ConcurrentSource --> Issue
    Investigation --> CompositeDecoder
    TestScript --> CompositeDecoder
    ProposedFix --> CompositeDecoder
    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit
        L3[Context/No Edit]:::context
    end
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF
Notes
- … CompositeRawDecoder - integration approach needs decision